Compression of Deep Neural Networks on the Fly
Thanks to their state-of-the-art performance, deep neural networks are
increasingly used for object recognition. To achieve these results, they rely
on millions of trainable parameters. However, when targeting embedded
applications the size of these models becomes problematic. As a consequence,
their usage on smartphones or other resource limited devices is prohibited. In
this paper we introduce a novel compression method for deep neural networks
that is performed during the learning phase. It consists in adding an extra
regularization term to the cost function of fully-connected layers. We combine
this method with Product Quantization (PQ) of the trained weights for higher
savings in storage consumption. We evaluate our method on two data sets (MNIST
and CIFAR10), on which we achieve significantly larger compression rates than
state-of-the-art methods.
Multi-scale Orderless Pooling of Deep Convolutional Activation Features
Deep convolutional neural networks (CNN) have shown their promise as a
universal representation for recognition. However, global CNN activations lack
geometric invariance, which limits their robustness for classification and
matching of highly variable scenes. To improve the invariance of CNN
activations without degrading their discriminative power, this paper presents a
simple but effective scheme called multi-scale orderless pooling (MOP-CNN).
This scheme extracts CNN activations for local patches at multiple scale
levels, performs orderless VLAD pooling of these activations at each level
separately, and concatenates the result. The resulting MOP-CNN representation
can be used as a generic feature for either supervised or unsupervised
recognition tasks, from image classification to instance-level retrieval; it
consistently outperforms global CNN activations without requiring any joint
training of prediction layers for a particular target dataset. In absolute
terms, it achieves state-of-the-art results on the challenging SUN397 and MIT
Indoor Scenes classification datasets, and competitive results on
ILSVRC2012/2013 classification and INRIA Holidays retrieval datasets.
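The pooling scheme described in the abstract above can be sketched as follows. This is a simplified illustration, not the paper's code: it assumes the patch descriptors and a codebook of centers are already available for each scale level, and the names vlad and multi_scale_pool are made up for this example. VLAD sums, per codeword, the residuals of the descriptors assigned to it, so the result does not depend on the order of the patches.

```python
import math

def vlad(descriptors, centers):
    """Orderless VLAD pooling: accumulate residuals to the nearest center."""
    dim = len(centers[0])
    residuals = [[0.0] * dim for _ in centers]
    for d in descriptors:
        nearest = min(
            range(len(centers)),
            key=lambda c: sum((x - y) ** 2 for x, y in zip(d, centers[c])),
        )
        for t in range(dim):
            residuals[nearest][t] += d[t] - centers[nearest][t]
    flat = [v for r in residuals for v in r]
    norm = math.sqrt(sum(v * v for v in flat))
    # L2-normalise unless every residual is zero
    return [v / norm for v in flat] if norm > 0 else flat

def multi_scale_pool(levels):
    """Pool each scale level separately and concatenate the results.

    `levels` is a list of (descriptors, centers) pairs, one per scale level
    (an assumed input layout for this sketch).
    """
    out = []
    for descriptors, centers in levels:
        out.extend(vlad(descriptors, centers))
    return out

if __name__ == "__main__":
    centers = [[0.0, 0.0], [10.0, 10.0]]
    patches = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
    print(multi_scale_pool([(patches, centers), (patches[:2], centers)]))
```

The output length is the sum over levels of (number of centers x descriptor dimension), so the codebook size per level directly controls the size of the final feature.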
Efficient On-the-fly Category Retrieval using ConvNets and GPUs
We investigate the gains in precision and speed that can be obtained by
using Convolutional Networks (ConvNets) for on-the-fly retrieval - where
classifiers are learnt at run time for a textual query from downloaded images,
and used to rank large image or video datasets.
We make three contributions: (i) we present an evaluation of state-of-the-art
image representations for object category retrieval over standard benchmark
datasets containing 1M+ images; (ii) we show that ConvNets can be used to
obtain features which are incredibly performant, and yet much lower dimensional
than previous state-of-the-art image representations, and that their
dimensionality can be reduced further without loss in performance by
compression using product quantization or binarization. Consequently, features
with the state-of-the-art performance on large-scale datasets of millions of
images can fit in the memory of even a commodity GPU card; (iii) we show that
an SVM classifier can be learnt within a ConvNet framework on a GPU in parallel
with downloading the new training images, allowing for a continuous refinement
of the model as more images become available, and simultaneous training and
ranking. The outcome is an on-the-fly system that significantly outperforms its
predecessors in terms of: precision of retrieval, memory requirements, and
speed, facilitating accurate on-the-fly learning and ranking in under a second
on a single GPU.
Comment: Published in proceedings of ACCV 201
Cross-dimensional Weighting for Aggregated Deep Convolutional Features
We propose a simple and straightforward way of creating powerful image
representations via cross-dimensional weighting and aggregation of deep
convolutional neural network layer outputs. We first present a generalized
framework that encompasses a broad family of approaches and includes
cross-dimensional pooling and weighting steps. We then propose specific
non-parametric schemes for both spatial- and channel-wise weighting that boost
the effect of highly active spatial responses and at the same time regulate
burstiness effects. We experiment on different public datasets for image search
and show that our approach outperforms the current state-of-the-art for
approaches based on pre-trained networks. We also provide an easy-to-use, open
source implementation that reproduces our results.
Comment: Accepted for publication at the 4th Workshop on Web-scale Vision and
Social Media (VSM), ECCV 201
Surface composition of BaTiO3/SrTiO3(001) films grown by atomic oxygen plasma assisted molecular beam epitaxy
We have investigated the growth of BaTiO3 thin films deposited on pure and 1%
Nb-doped SrTiO3(001) single crystals using atomic oxygen assisted molecular
beam epitaxy (AO-MBE) and dedicated Ba and Ti Knudsen cells. Thicknesses up to
30 nm were investigated for various layer compositions. We demonstrate 2D
growth and epitaxial single crystalline BaTiO3 layers up to 10 nm before
additional 3D features appear; lattice parameter relaxation occurs during the
first few nanometers and is completed at about 10 nm. The presence
of a Ba oxide rich top layer that probably favors 2D growth is evidenced for
well crystallized layers. We show that the Ba oxide rich top layer can be
removed by chemical etching. The present work stresses the importance of
stoichiometry and surface composition of BaTiO3 layers, especially in view of
their integration in devices.
Comment: In press in J. Appl. Phy
Exploring Spatial Correlation for Visual Object Retrieval
2013-2014 > Academic research: refereed > Publication in refereed journa
PlaNet - Photo Geolocation with Convolutional Neural Networks
Is it possible to build a system to determine the location where a photo was
taken using just its pixels? In general, the problem seems exceptionally
difficult: it is trivial to construct situations where no location can be
inferred. Yet images often contain informative cues such as landmarks, weather
patterns, vegetation, road markings, and architectural details, which in
combination may allow one to determine an approximate location and occasionally
an exact location. Websites such as GeoGuessr and View from your Window suggest
that humans are relatively good at integrating these cues to geolocate images,
especially en masse. In computer vision, the photo geolocation problem is
usually approached using image retrieval methods. In contrast, we pose the
problem as one of classification by subdividing the surface of the earth into
thousands of multi-scale geographic cells, and train a deep network using
millions of geotagged images. While previous approaches only recognize
landmarks or perform approximate matching using global image descriptors, our
model is able to use and integrate multiple visible cues. We show that the
resulting model, called PlaNet, outperforms previous approaches and even
attains superhuman levels of accuracy in some cases. Moreover, we extend our
model to photo albums by combining it with a long short-term memory (LSTM)
architecture. By learning to exploit temporal coherence to geolocate uncertain
photos, we demonstrate that this model achieves a 50% performance improvement
over the single-image model.
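To make the "geolocation as classification" idea in the abstract above concrete, here is a small sketch of turning geotagged photos into class labels. It is only an illustration under stated assumptions: it uses a plain latitude/longitude quadtree rather than whatever cell scheme the paper uses, and build_cells, cell_label, max_per_cell, and max_depth are hypothetical names and parameters. Dense regions are subdivided further, which gives cells at multiple scales; each remaining cell becomes one output class for a classifier.

```python
def build_cells(points, max_per_cell, bounds=(-90.0, 90.0, -180.0, 180.0), max_depth=10):
    """Recursively split a lat/lng box until each cell holds few enough photos.

    Boxes are half-open, [lat_lo, lat_hi) x [lng_lo, lng_hi), so a point at
    exactly lat 90 or lng 180 falls outside this simplified grid.
    Cells that contain no photos are dropped.
    """
    lat_lo, lat_hi, lng_lo, lng_hi = bounds
    inside = [
        (lat, lng) for lat, lng in points
        if lat_lo <= lat < lat_hi and lng_lo <= lng < lng_hi
    ]
    if len(inside) <= max_per_cell or max_depth == 0:
        return [bounds] if inside else []
    lat_mid = (lat_lo + lat_hi) / 2.0
    lng_mid = (lng_lo + lng_hi) / 2.0
    quadrants = [
        (lat_lo, lat_mid, lng_lo, lng_mid),
        (lat_lo, lat_mid, lng_mid, lng_hi),
        (lat_mid, lat_hi, lng_lo, lng_mid),
        (lat_mid, lat_hi, lng_mid, lng_hi),
    ]
    cells = []
    for quad in quadrants:
        cells.extend(build_cells(inside, max_per_cell, quad, max_depth - 1))
    return cells

def cell_label(lat, lng, cells):
    """Return the class index of the cell containing (lat, lng), or None."""
    for idx, (lat_lo, lat_hi, lng_lo, lng_hi) in enumerate(cells):
        if lat_lo <= lat < lat_hi and lng_lo <= lng < lng_hi:
            return idx
    return None

if __name__ == "__main__":
    photos = [(48.85, 2.35), (51.51, -0.13), (41.90, 12.50), (-33.87, 151.21)]
    cells = build_cells(photos, max_per_cell=1)
    print(len(cells), [cell_label(lat, lng, cells) for lat, lng in photos])
```

A network trained on these labels outputs a probability distribution over cells, so a predicted location is a cell (or its center) rather than an exact coordinate.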
Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval
Optimising a ranking-based metric, such as Average Precision (AP), is
notoriously challenging due to the fact that it is non-differentiable, and
hence cannot be optimised directly using gradient-descent methods. To this end,
we introduce an objective that optimises instead a smoothed approximation of
AP, coined Smooth-AP. Smooth-AP is a plug-and-play objective function that
allows for end-to-end training of deep networks with a simple and elegant
implementation. We also present an analysis for why directly optimising the
ranking based metric of AP offers benefits over other deep metric learning
losses. We apply Smooth-AP to standard retrieval benchmarks: Stanford Online
Products and VehicleID, and also evaluate on larger-scale datasets: iNaturalist
for fine-grained category retrieval, and VGGFace2 and IJB-C for face retrieval.
In all cases, we improve the performance over the state-of-the-art, especially
for larger-scale datasets, thus demonstrating the effectiveness and scalability
of Smooth-AP to real-world scenarios.
Comment: Accepted at ECCV 202
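The abstract above does not spell out the smoothing, so the following is a sketch under an assumption: the hard "is item j ranked above item i" indicator inside AP is replaced by a sigmoid of the score difference with a temperature tau. The function names and the default tau are illustrative. As tau shrinks, the smoothed value approaches the exact AP, while for larger tau the function has useful gradients with respect to the scores.

```python
import math

def sigmoid(x, tau):
    """Numerically stable logistic function of x / tau."""
    z = x / tau
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def exact_ap(scores, labels):
    """Standard Average Precision for one query (higher score = ranked earlier)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("AP is undefined without positive items")
    return total / hits

def smooth_ap(scores, labels, tau=0.01):
    """Sigmoid-relaxed AP: ranks are 1 + sum of soft 'j beats i' indicators."""
    positives = [i for i, is_positive in enumerate(labels) if is_positive]
    if not positives:
        raise ValueError("AP is undefined without positive items")
    total = 0.0
    for i in positives:
        # soft rank of i among the positives only
        rank_pos = 1.0 + sum(
            sigmoid(scores[j] - scores[i], tau) for j in positives if j != i
        )
        # soft rank of i among all items
        rank_all = 1.0 + sum(
            sigmoid(scores[j] - scores[i], tau) for j in range(len(scores)) if j != i
        )
        total += rank_pos / rank_all
    return total / len(positives)

if __name__ == "__main__":
    s, y = [0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]
    print(exact_ap(s, y), smooth_ap(s, y, tau=0.001), smooth_ap(s, y, tau=1.0))
```

In training, 1 - smooth_ap would serve as the loss, computed with an automatic-differentiation framework instead of plain floats; the temperature trades off fidelity to the true AP against gradient signal.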